
***************************************
Author: Avi Ebenstein/Shannon Phillips
Date  : December 2010
***************************************

1. morg_data.do. Take the CEPR MORG data (which are cleaned NBER data)
   and perform some basic edits, such as assigning the Autor data to
   the MORG workers. I also assign each worker a man7090 industry code. This creates
   the critical files, including morg_data.dta, which has all workers,
   and morg_man7090.dta, which has all manufacturing workers.

2. trade_data.do. Creates a data set with the trade data at the
   3-digit SIC level using the CEW 4-digit SIC employment as 
   weights. 
   This requires me to first recode the older CEW data as SIC87
   using the NBER concordance. Then I aggregate up to 3-digit SIC
   and ultimately to man7090. man7090 is a consistent industry 
   definition that is applied to the CPS MORG sample.
   This is saved as trade_man7090.dta.
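
A minimal Python sketch of the employment-weighted step (the project itself is written in Stata). It assumes that each 4-digit SIC record carries a trade measure and its CEW employment, and that the 3-digit value is the employment-weighted mean of its 4-digit children. `aggregate_to_sic3` and the tuple layout are hypothetical. The do-file may handle level variables, such as import values, differently (for example, by summing them).

```python
from collections import defaultdict

def aggregate_to_sic3(rows):
    """Employment-weighted mean of a 4-digit SIC variable within each 3-digit SIC.

    rows: iterable of (sic4, year, value, employment), sic4 as a 4-character string.
    Returns {(sic3, year): weighted mean}. Rows with a missing value (None) or
    zero employment are skipped; this is an assumption, not taken from trade_data.do.
    """
    num = defaultdict(float)  # sum of value * employment per (sic3, year)
    den = defaultdict(float)  # sum of employment per (sic3, year)
    for sic4, year, value, emp in rows:
        if value is None or not emp:
            continue
        key = (sic4[:3], year)  # 3-digit SIC = first three digits of the 4-digit code
        num[key] += value * emp
        den[key] += emp
    return {key: num[key] / den[key] for key in num}
```

For example, two 4-digit industries in SIC 201 with values 0.25 and 0.75 and employment of 300 and 100 aggregate to (75 + 75) / 400 = 0.375.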

3. cew_75_00.do. Brings in the CEW data from the BLS raw data. These
   data contain employment and wage info at the 4-digit SIC code level
   for 1975-2000.

4. cew_data.do. Use the 3-digit SIC codes directly (SIC72 for 1975-87 and
   SIC87 for 1988-2000) to recode them into man7090.
   This is saved as cew_man7090.dta.
   Note: run cew_75_00.do first.

5. computeruse.do. The data sets cps1-mf.dta (computer use rates by industry x year)
   and oct8497means-gen.dta (computer use rates by occupation x year) were 
   provided by David Autor and are used in Autor et al. (1998);
   oct8497means-gen.dta is merged with cpsaug00, the CPS computer use 
   supplement file. We calculate average industry and occupation computer use 
   rates from the October 1984, 1989, 1993, and 1997 and August 2000 CPS Internet 
   and Computer Use Supplement files. The use rate within an industry or occupation 
   is the fraction of respondents who report using a computer or the internet at work. 
   We impute computer use rates for the remaining years by linear interpolation
   within each occupation or industry between adjacent survey years. We extend the
   1984-1989 linear trend back to 1982-1983, with a lower bound of 0%, and the
   1997-2000 linear trend forward to 2001-2002, with an upper bound of 100%.
   Creates compuse_occ8090.dta and compuse_ind7090.dta.
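
The interpolation and bounded extrapolation described above can be sketched in Python as follows (the project itself uses Stata). `impute_computer_use` is a hypothetical name, and rates are assumed to be stored as fractions between 0 and 1. Applying both bounds in every year is a simplification. The text only states a floor for 1982-1983 and a cap for 2001-2002, and interior interpolated values cannot leave [0, 1] in any case.

```python
def impute_computer_use(observed, first_year=1982, last_year=2002):
    """Fill in computer-use rates (fractions in [0, 1]) for every year.

    observed: {year: rate} for survey years, e.g. 1984, 1989, 1993, 1997, 2000.
    Interior years are linearly interpolated between the bracketing survey years.
    Years before the first survey extend the trend between the first two survey
    years; years after the last survey extend the trend between the last two.
    Every imputed value is clamped to [0, 1].
    """
    years = sorted(observed)
    out = {}
    for y in range(first_year, last_year + 1):
        if y in observed:
            out[y] = observed[y]
            continue
        if y < years[0]:
            y0, y1 = years[0], years[1]      # backward extrapolation
        elif y > years[-1]:
            y0, y1 = years[-2], years[-1]    # forward extrapolation
        else:
            y0 = max(s for s in years if s < y)  # bracketing survey years
            y1 = min(s for s in years if s > y)
        slope = (observed[y1] - observed[y0]) / (y1 - y0)
        rate = observed[y0] + slope * (y - y0)
        out[y] = min(max(rate, 0.0), 1.0)
    return out
```

For example, with hypothetical rates of 0.25 in 1984 and 0.375 in 1989, the 1984-1989 slope is 0.025 per year, so 1983 is imputed at 0.225 and 1982 at 0.2.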
   
6. concordance.do. Reads in 4-digit SIC codes and 5-digit SITC codes for
   imports (conimp89_01.asc) and exports (conexp89_01.asc). Truncates 
   SITC codes to the 1-, 2-, and 3-digit levels in order to obtain export
   and import prices at these levels. Reads in export 
   and import prices (ei_data_3_SITC.raw) and formats sitc1, sitc2, and sitc3
   into sitc. Truncates 4-digit SIC codes to 3-digit SIC codes (SIC87) so that
   they can be merged with sic87-3_man7090.dta to obtain man7090 codes.
   Makes 3-digit SIC codes consistent within 1-, 2-, and 
   3-digit SITC categories and saves export and import prices in
   1digit.dta, 2digit.dta, and 3digit.dta. 
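
SITC codes can begin with a zero (section 0 is food and live animals), so truncation is safest when applied to string codes. The following is a hypothetical Python sketch that assumes codes are read as strings. The function names are illustrative and do not come from concordance.do.

```python
def truncate_code(code, digits):
    """Return the first `digits` characters of a classification code.

    Codes are kept as strings so that leading zeros survive; a numeric
    code such as 1121 would already have lost the leading zero of "01121".
    """
    code = str(code).strip()
    if len(code) < digits:
        raise ValueError(f"code {code!r} has fewer than {digits} digits")
    return code[:digits]

def sitc_levels(sitc5):
    """1-, 2-, and 3-digit SITC aggregates of a 5-digit SITC code."""
    return {f"sitc{d}": truncate_code(sitc5, d) for d in (1, 2, 3)}
```

For example, `sitc_levels("01121")` returns `{"sitc1": "0", "sitc2": "01", "sitc3": "011"}`, and `truncate_code("2011", 3)` returns `"201"` for the 4- to 3-digit SIC step.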

7.  match_madrian.do. Construct a panel data set from the CPS MORG by matching
    individuals surveyed in two consecutive years between 1983 and 2002. 
    Each month, one quarter of survey respondents are asked questions about their 
    job, including wage, occupation, and industry. One year later, half of these 
    individuals are asked the same job-related questions again. We use Madrian 
    and Lefgren's (2000) matching algorithm, which first matches individuals on their
    household identifier, household number, and individual line number. Matching on
    these identifiers alone leaves a high non-match rate, because survey respondents
    who move out of a housing unit are replaced in the sample by those who move in and
    receive the same identifiers, and because of nonresponse, mortality, and migration.
    A naive match is then dropped if it fails the sex, race, or age criterion. After
    these checks, the match rate is 50%, with 871,917 individuals matched, or 1,743,834 
    observations out of a possible 3,481,692.
    Creates match_madrian_long.dta, match_madrian_wide.dta, and switchers.dta.
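
The following simplified Python sketch shows the two-step match. It assumes one record per person per year and unique identifier keys within a year. The `Person` fields and the function name are illustrative. The age window of -1 to +3 years between interviews is an assumption, and the exact tolerances in match_madrian.do may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    hhid: str     # household identifier
    hhnum: int    # household number
    lineno: int   # individual line number within the household
    sex: int
    race: int
    age: int

def person_key(p):
    """Identifiers used for the naive match."""
    return (p.hhid, p.hhnum, p.lineno)

def match_consecutive_years(year1, year2, age_lo=-1, age_hi=3):
    """Match records from two consecutive years.

    Step 1 (naive match): join on (hhid, hhnum, lineno).
    Step 2: drop naive matches whose sex or race differ, or whose age change
    falls outside [age_lo, age_hi] years (assumed tolerance).
    Returns a list of (record_year1, record_year2) pairs.
    """
    later = {person_key(p): p for p in year2}  # assumes keys are unique
    pairs = []
    for p in year1:
        q = later.get(person_key(p))
        if q is None:
            continue  # no naive match
        if p.sex != q.sex or p.race != q.race:
            continue  # likely a different person at the same address
        if not age_lo <= q.age - p.age <= age_hi:
            continue
        pairs.append((p, q))
    return pairs
```

Each matched individual contributes two observations. The 871,917 matches therefore yield 1,743,834 observations, and 1,743,834 / 3,481,692 is about 0.501, which is the 50% match rate quoted above.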

8.  merge_data_educ_man7090. Merge the trade, CEW, BEA, and premium info. 
    Creates merge_educ_man7090.dta. Industry x education category x year level.

9. merge_data_ind7090.do. Merge the trade, CEW, BEA, and premium info. 
   Creates merge_micro_ind7090.dta. Micro data, all industries.

10. merge_data_man7090.do. Merge the trade, CEW, BEA, and premium info. 
    Creates merge_micro_man7090.dta. Micro data, only mfg industries.

11. offshore_exposure.do. Micro data, with trade and offshoring variables weighted by
    each worker's effective exposure, using occupation-by-industry employment
    weights. 
    Prepares the BEA offshore employment data, the import/export data, and the share
    of imports from low-wage countries, all organized by ind7090, along with import
    and export prices at the 1-, 2-, and 3-digit level, also organized by ind7090.
    Creates a vector of trade/offshoring variables by year for all industries (wide)
    and a vector of employment weights by occupation, i.e., the share of each
    occupation's employment in each ind7090 industry. Takes the weighted sum of
    emp_share * trade/offshoring variable to assign "effective exposure" to each
    occupation, focusing on ind7090 industries in manufacturing (with non-zero
    trade/offshoring). Saves a year X occupation matrix of effective exposure to
    each trade/offshoring measure, then assigns these new measures to the workers.
    Creates offshore_exposure.dta.
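
The effective-exposure calculation can be sketched in Python for a single year (the project itself uses Stata). The dictionary layout and `effective_exposure` are hypothetical. Shares are computed over all of an occupation's industries, and industries without a trade/offshoring value (such as non-manufacturing ones) contribute zero. This is one reading of the restriction to manufacturing described above.

```python
from collections import defaultdict

def effective_exposure(emp, trade):
    """Occupation-level exposure as an employment-share-weighted sum.

    emp:   {(occ, ind): employment} for one year
    trade: {ind: trade/offshoring measure} for the same year; industries
           missing from `trade` contribute zero.
    Returns {occ: sum over ind of emp_share[occ, ind] * trade[ind]}, where
    emp_share[occ, ind] is the occupation's employment share in industry ind.
    """
    occ_total = defaultdict(float)
    for (occ, _ind), e in emp.items():
        occ_total[occ] += e
    exposure = defaultdict(float)
    for (occ, ind), e in emp.items():
        if occ_total[occ] > 0:
            exposure[occ] += (e / occ_total[occ]) * trade.get(ind, 0.0)
    return dict(exposure)
```

Consider a hypothetical occupation with 60% of its employment in an industry with exposure 0.5, 20% in an industry with exposure 0.25, and 20% outside manufacturing. Its effective exposure is 0.6 × 0.5 + 0.2 × 0.25 = 0.35.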

12. premium_educ_man7090.do. Prepares the MORG manufacturing workers for 
    regressions by merging morg_man7090.dta with the trade and CEW
    data and collapsing by year, industry, and education. 
    The dimension of this data set is 68 industries X 24 years X
    5 education categories = 8,160 cells; after dropping missing observations,
    8,012 remain.
    This creates merge_educ_man7090.dta.
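
The following Python sketch illustrates the collapse step (the project itself is in Stata). It assumes the collapsed statistic is a cell mean of the log wage. The field names `year`, `educ`, and `lnwage` are hypothetical and sit alongside man7090. Empty cells simply drop out, which is one way the 68 X 24 X 5 = 8,160 possible cells could shrink to the 8,012 observations reported above (148 fewer).

```python
from collections import defaultdict

def collapse_cells(workers, value="lnwage"):
    """Collapse worker records to (year, man7090, educ) cells.

    workers: iterable of dicts with keys "year", "man7090", "educ", and `value`.
    Returns {(year, man7090, educ): (mean of value, number of workers)}.
    Cells with no workers do not appear in the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for w in workers:
        key = (w["year"], w["man7090"], w["educ"])
        sums[key] += w[value]
        counts[key] += 1
    return {k: (sums[k] / counts[k], counts[k]) for k in sums}
```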

13. premium_ind7090.do. Prepares the MORG all-sector workers for 
    regressions by calculating wage premia. The dimension of
    premium_ind7090.dta is 195 industries X 24 years = 4,680 cells; after
    dropping missing cells, 4,659 observations remain. 
    This creates micro_ind7090.dta.
 
14. premium_man7090.do. Prepares the MORG manufacturing workers for 
    regressions by calculating wage premia. The dimension of
    premium_man7090.dta is 68 industries X 24 years = 1,632 observations. 
    This creates micro_man7090.dta.

15. premium_occ8090_all.do. Reads in all workers from morg_data.dta
    and creates wage residuals at the occupation x year level, for all 
    workers, skilled (some college or more) and unskilled (high school degree
    or less) workers. Saved as premium_occ8090_all.dta.

16. premium_occ8090_mfg.do. Reads in manufacturing workers from morg_man7090.dta
    and creates wage residuals at the occupation x year level, for all mfg
    workers, skilled (some college or more) and unskilled (high school degree
    or less) workers. Saved as premium_occ8090_mfg.dta.
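
The occupation-level wage residuals can be sketched in Python using a deliberately simple stand-in for the wage regression. Each worker's residual is the log wage minus the mean log wage of workers with the same education in the same year. These residuals are then averaged by occ8090 x year for all, skilled, and unskilled workers. The controls used in premium_occ8090_all.do and premium_occ8090_mfg.do are not stated here, so this residualization, the field names, and the 1-5 education coding are all assumptions.

```python
from collections import defaultdict

# Hypothetical education coding: 1 <HS, 2 HS, 3 some college, 4 BA, 5 graduate.
SOME_COLLEGE = 3

def _group_means(records, key, value):
    """Mean of value(r) within each group key(r)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        k = key(r)
        sums[k] += value(r)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

def occupation_premia(workers):
    """Mean wage residual by (occ8090, year) for all, skilled, and unskilled workers.

    Stand-in residual: log wage minus the mean log wage of the worker's
    (year, educ) cell, rather than a residual from a full wage regression.
    """
    cell_mean = _group_means(workers, lambda w: (w["year"], w["educ"]),
                             lambda w: w["lnwage"])
    resid = [dict(w, resid=w["lnwage"] - cell_mean[(w["year"], w["educ"])])
             for w in workers]
    groups = {
        "all": resid,
        "skilled": [w for w in resid if w["educ"] >= SOME_COLLEGE],    # some college or more
        "unskilled": [w for w in resid if w["educ"] < SOME_COLLEGE],   # high school or less
    }
    return {name: _group_means(ws, lambda w: (w["occ8090"], w["year"]),
                               lambda w: w["resid"])
            for name, ws in groups.items()}
```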

17. premium_prices_1digit.do. Merges 1-digit SIC level export and import 
    prices now at the man7090 x year level (man7090_sic1.dta) and 
    wage premiums at the occupation x year level (premium_occ8090.dta) 
    with manufacturing workers (micro_man7090). Collapse the file into 
    an occupation x year level dataset (premium_prices1dig_occ8090.dta)
    and a man7090 x year level dataset (premium_prices1dig_man7090.dta).
    Once the man7090 x year dataset is merged with the occupation x year
    dataset, results can no longer be broken out by sic3dig.

18. premium_prices_2digit.do. Merges 2-digit SIC level export and import 
    prices now at the man7090 x year level (man7090_sic2.dta) and 
    wage premiums at the occupation x year level (premium_occ8090.dta) 
    with manufacturing workers (micro_man7090). Collapse the file into 
    an occupation x year level dataset (premium_prices2dig_occ8090.dta)
    and a man7090 x year level dataset (premium_prices2dig_man7090.dta).

19. premium_prices_3digit.do. Merges 3-digit SIC level export and import 
    prices now at the man7090 x year level (man7090_sic3.dta) and 
    wage premiums at the occupation x year level (premium_occ8090.dta) 
    with manufacturing workers (micro_man7090). Collapse the file into 
    an occupation x year level dataset (premium_prices3dig_occ8090.dta)
    and a man7090 x year level dataset (premium_prices3dig_man7090.dta).
 
20. price_data_all.do. We create a set of price indices at the industry 
    level using data from the Bureau of Labor Statistics (BLS) and the 
    US Department of Agriculture (USDA). For most industries, a match is 
    made to the industry-specific BLS Producer Price Index (PPI) data. 
    For the four agricultural industries, we use an industry-specific 
    Producer Price Index from the USDA. For 46 industries, a consistent 
    PPI is not available across time; we instead use a product-specific 
    Consumer Price Index from the BLS. For an additional 46 industries 
    (mainly in the service sector), no consistent PPI or CPI is available 
    for our entire time period; we instead use the economy-wide PPI from 
    the BLS. For eleven of the industries, coverage begins in 1985; 
    we freeze prior years to the 1985 level. The series are simple averages
    across monthly values, and are not seasonally adjusted. The PPI series 
    is "Producer Price Index Revision-Discontinued Series (SIC)" and is 
    available for download at ftp://ftp.bls.gov/pub/time.series/pd/. 
    The CPI series is "Consumer Price Index-All Urban Consumers (Current Series)" 
    and is available for download at ftp://ftp.bls.gov/pub/time.series/cu/. 
    The USDA series is available for download at 
    http://www.nass.usda.gov/Charts_and_Maps/graphics/data/allprpd.txt.
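
The annual-averaging and freezing rules can be sketched in Python (the project itself uses Stata). `annual_index` and the `(year, month)` layout are hypothetical. Two further assumptions apply: a partially covered year is averaged over whatever months are present, and the default range is 1983-2002.

```python
from statistics import mean

def annual_index(monthly, first_year=1983, last_year=2002):
    """Annual price index from monthly, not seasonally adjusted values.

    monthly: {(year, month): index value}.
    Each year's value is the simple average of its available monthly values.
    Years before the series begins are frozen at the first covered year's level
    (e.g. 1985 for series whose coverage starts then). Years after the start
    with no monthly data are left out.
    """
    by_year = {}
    for (year, _month), v in monthly.items():
        by_year.setdefault(year, []).append(v)
    annual = {y: mean(vs) for y, vs in by_year.items()}
    start = min(annual)
    out = {}
    for y in range(first_year, last_year + 1):
        if y < start:
            out[y] = annual[start]  # freeze earlier years at the first covered year
        elif y in annual:
            out[y] = annual[y]
    return out
```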

21. prices_man7090.do. Merges sic87-3_man7090.dta with 1digit.dta, 2digit.dta, 
    and 3digit.dta and saves export and import prices at the man7090 x year 
    level. Where a price index does not exist, we assume that exports or imports
    are very small, which we treat as zero exposure at the occupation level:
    expfin is set to 0 if it is missing and impfin is non-missing, and vice
    versa. Creates man7090_sic1.dta, man7090_sic2.dta, man7090_sic3.dta.
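
Here is the zero-fill rule for one man7090 x year cell as a hypothetical Python sketch, with missing values represented as `None`:

```python
def fill_missing_prices(expfin, impfin):
    """Apply the zero-exposure rule to one man7090 x year cell.

    If exactly one of expfin (exports) and impfin (imports) is missing (None),
    the missing one is set to 0. If both are missing, both stay missing.
    """
    if expfin is None and impfin is not None:
        expfin = 0.0
    elif impfin is None and expfin is not None:
        impfin = 0.0
    return expfin, impfin
```

In Stata, the same rule would presumably be written as `replace expfin = 0 if missing(expfin) & !missing(impfin)`, followed by the mirror-image line for impfin.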

22. switchers.do. Prepares the data for Tables 7 and 8, using the match_madrian and switchers data.

23. table1.do. Uses merge_educ_man7090.dta.
    Dependent Variable: Log Difference in Employment Offshored (1983-2002)
    to Low Income Countries and High Income Countries, 
    Import Penetration Difference (1983-2002)
    OLS Estimates of Change in Offshoring and Import Penetration Given Industry Skill Composition in 1983

24. table2_ind.do. Uses merge_educ_man7090.dta.
    Dependent Variable: Log Wage
    OLS Estimates of Wage Determinants using Occupational versus Industry Offshoring Exposure, 1983-2002 
    Offshoring Measured by Industry-Specific Exposure, Manufacturing Only

25. table2_occ_tfp.do. Uses offshore_exposure.dta.
    Dependent Variable: Log Wage
    OLS Estimates of Wage Determinants using Occupational versus Industry Offshoring Exposure, 1983-2002 
    Offshoring Measured by Occupation-Specific Exposure, All Sectors

26. table3.do. Uses offshore_exposure.dta.
    Dependent Variable: Log Wage
    OLS Estimates of Wage Determinants Overall, By Occupational Exposure of a Group to Trade and Offshoring 

27. table4_ind.do. Uses merge_educ_man7090.dta.
    1997-2002 Only, Dependent Variable: Log Wage
    OLS Estimates of Wage Determinants using Occupational versus Industry Offshoring Exposure, 1983-2002 
    Offshoring Measured by Industry-Specific Exposure, Manufacturing Only

28. table4_occ_tfp.do. Uses offshore_exposure.dta.
    1997-2002 Only, Dependent Variable: Log Wage
    OLS Estimates of Wage Determinants using Occupational versus Industry Offshoring Exposure, 1983-2002 
    Offshoring Measured by Occupation-Specific Exposure, All Sectors

29. table5.do. Uses merge_educ_man7090.dta.
    Dependent Variable: Log U.S. Manufacturing Sector Employment
    OLS Estimates of Employment Determinants in Manufacturing, 1983-2002 

30. table6.do. Uses match_madrian_wide.dta.
    Dependent Variable: Log Wage Change Between Periods
    Wage Changes Among Manufacturing Workers Observed 2 Periods Who Switch Industry, 1983-2002

31. table7.do. Uses switchers.dta.
    Dependent Variable: Log Wage Change Between Periods
    Wage Changes Among All Workers Observed 2 Periods and the Impact of Leaving Manufacturing, 1983-2002

32. table8.do. Uses switchers.dta.
    Dependent Variable: Log Wage Change Between Periods
    Wage Changes Among All Workers Observed 2 Periods by Industry- and Occupation-Specific Exposure to Offshoring, 1983-2002 

33. appendixtable1.do. Uses offshore_exposure.dta and merge_micro_ind7090.dta.
    Summary Statistics of Current Population Survey Merged Outgoing Rotation Group Workers, Means and (Standard Deviations), 1983-2002

34. appendixtableA1.do. Uses merge_educ_man7090.dta.
    Summary Statistics on Industry-Year Cells 

35. appendixtableA2.do. Uses merge_educ_man7090.dta.
    Robustness Checks of Estimates of Employment Determinants, 1983-2002 

36. appendixtableA3.do. Uses merge_educ_man7090.dta.
    Instrumental Variable Estimates of Employment Determinants Overall, 1983-2002

37. appendixtableA4.do. Uses merge_educ_man7090.dta and offshore_exposure.dta.
    Robustness of OLS Estimates of Wage Determinants to using Import Prices instead of Quantities,   
    using Occupational versus Industry Offshoring Exposure, 1983-2002

38. appendixtableA5.do. Uses offshore_exposure.dta.
    Import Penetration in 1983 and 2002, for 40 Occupations with Highest Import Penetration in 2002

Auxiliary:

1. man7090_bea.do. This replaces man7090 with an aggregated version (~40
   categories instead of 68) that allows the data to be
   merged with the BEA offshore employment data. Before running it, I save the
   original man7090 in man7090_orig.dta.

2. keepvars.do, keepvars2.do. Keep a subset of variables.

3. industrylist.do, industrylist_short.do, man7090_orig_ind7090.do.
   These files are necessary to assign effective trade and offshore 
   variables to workers using the distribution of occupations across 
   industries.

4. occupationlabels. Defines and attaches labels to occ8090.